Using kNN Model-based Approach for Automatic Text Categorization
Abstract
Two well-known similarity-based learning approaches to text categorization are investigated: the k-nearest neighbor (k-NN) classifier and the Rocchio classifier. Based on the strengths and weaknesses identified for each technique, a new classifier, the kNN model-based classifier (kNNModel), is proposed that combines the strengths of both k-NN and Rocchio. A prototype text categorization system is presented that implements kNNModel along with k-NN, Rocchio, and Support Vector Machine (SVM) classifiers. The classifiers are evaluated on two common document corpora: the 20 Newsgroups collection and the ModApte version of the Reuters-21578 collection of news stories. The experimental results show that the kNN model-based approach outperforms the k-NN and Rocchio classifiers and is comparable to SVM, which is used as a benchmark in the experiments.
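As a rough illustration of the two similarity-based baselines named in the abstract, the sketch below classifies a document by k-NN voting and by a Rocchio-style nearest centroid. It is not the kNNModel classifier, whose construction the abstract does not describe. The toy corpus, the whitespace tokenizer, the raw term-frequency weighting, and all function names are assumptions made for illustration, and the Rocchio variant here uses positive examples only (no negative-example term).

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: (document text, category label).
TRAIN = [
    ("stocks fell as oil prices rose", "finance"),
    ("the bank raised interest rates", "finance"),
    ("shares and bond prices dropped", "finance"),
    ("the team won the final match", "sport"),
    ("the striker scored two goals", "sport"),
    ("coach praised the team defence", "sport"),
]

def vectorize(text):
    """Bag-of-words term-frequency vector (toy whitespace tokenizer)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(train, query, k=3):
    """Majority vote among the k training documents most similar to the query."""
    q = vectorize(query)
    ranked = sorted(train, key=lambda pair: cosine(vectorize(pair[0]), q), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def rocchio_fit(train):
    """Build one prototype per category by summing its documents' vectors.

    Cosine similarity ignores vector length, so the sum points in the same
    direction as the mean (centroid) of the category.
    """
    centroids = defaultdict(Counter)
    for text, label in train:
        centroids[label].update(vectorize(text))
    return centroids

def rocchio_predict(centroids, query):
    """Assign the category whose prototype is most similar to the query."""
    q = vectorize(query)
    return max(centroids, key=lambda label: cosine(centroids[label], q))
```

k-NN keeps every training document and compares the query against all of them, while the Rocchio-style classifier reduces each category to a single prototype vector. A call such as `knn_predict(TRAIN, "oil prices and interest rates")` or `rocchio_predict(rocchio_fit(TRAIN), "team scored goals")` returns a category label from the toy corpus.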
Similar resources
An kNN Model-Based Approach and Its Application in Text Categorization
An investigation has been conducted on two well known similarity-based learning approaches to text categorization. This includes the k-nearest neighbor (kNN) classifier and the Rocchio classifier. After identifying the weakness and strength of each technique, we propose a new classifier called the kNN model-based classifier by unifying the strengths of k-NN and Rocchio classifier and adapting t...
Automated multi-label text categorization with VG-RAM weightless neural networks
In automated multi-label text categorization, an automatic categorization system should output a label set, whose size is unknown a priori, for each document under analysis. Many machine learning techniques have been used for building such automatic text categorization systems. In this paper, we examine virtual generalizing random access memory weightless neural networks (VG-RAM WNN), an effect...
KNN based Machine Learning Approach for Text and Document Mining
Text Categorization (TC), also known as Text Classification, is the task of automatically classifying a set of text documents into different categories from a predefined set. If a document belongs to exactly one of the categories, it is a single-label classification task; otherwise, it is a multi-label classification task. TC uses several tools from Information Retrieval (IR) and Machine Learni...
Text Categorization for Authorship based on the Features of Lingual Conceptual Expression
Text categorization is an important field in automatic text information processing. Moreover, authorship identification of a text can be treated as a special case of text categorization. This paper adopts the conceptual primitives’ expression based on the Hierarchical Network of Concepts (HNC) theory, which can describe word meanings in hierarchical symbols, in order to avoid the spars...
Svm Based Improvement in Knn for Text Categorization
In today's library science, information science, and computer science, online text classification, or text categorization, is a major challenge. [1] With the enormous growth of online information and data, text categorization has become one of the crucial techniques for handling and standardizing text data. Various learning algorithms have been applied to text for categorization. On the basis of ...
Publication date: 2003